Sequential routing framework: Fully capsule network-based speech recognition

Authors

Abstract

Capsule networks (CapsNets) have recently gotten attention as a novel neural architecture. This paper presents the sequential routing framework, which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized and then sliced by a window size. Each slice is classified to a label at the corresponding time through iterative routing mechanisms. Afterwards, losses are computed by connectionist temporal classification (CTC). During routing, the required number of parameters can be controlled by the window size regardless of the length of the sequences, by sharing learnable weights across the slices. We additionally propose a sequential dynamic routing algorithm to replace traditional dynamic routing. The proposed technique can minimize the decoding speed degradation caused by the routing iterations, since it can operate in a non-iterative manner without dropping accuracy. The proposed method achieves a 1.1% lower word error rate, at 16.9%, on the Wall Street Journal corpus compared to bidirectional long short-term memory-based CTC networks. On the TIMIT corpus, it attains a 0.7% lower phone error rate, at 17.5%, compared to convolutional neural network-based CTC networks (Zhang et al., 2016).
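The slicing and weight-sharing idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses the standard iterative dynamic routing of Sabour et al. (2017) rather than the proposed sequential dynamic routing, omits the CTC loss, and assumes zero padding at the sequence edges. All function names, shapes, and hyperparameters (`slice_windows`, `route_slice`, `sequential_routing`, `n_iter`) are hypothetical.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule squashing nonlinearity: keeps direction, maps the norm into [0, 1)."""
    sq = np.sum(s * s, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def slice_windows(caps, window):
    """Cut a (T, n_in, d_in) capsule sequence into T slices of `window` frames,
    one centred on each time step. Zero padding at the edges is an assumption."""
    half = window // 2
    padded = np.pad(caps, ((half, window - 1 - half), (0, 0), (0, 0)))
    return np.stack([padded[t:t + window] for t in range(caps.shape[0])])

def route_slice(u, W, n_iter=3):
    """Standard dynamic routing for one slice (illustrative stand-in).
    u: (I, d_in) input capsules, W: (I, n_out, d_out, d_in), I = window * n_in."""
    u_hat = np.einsum('iokd,id->iok', W, u)         # predictions, (I, n_out, d_out)
    b = np.zeros(u_hat.shape[:2])                   # routing logits, (I, n_out)
    for _ in range(n_iter):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)           # softmax over output capsules
        v = squash(np.einsum('io,iok->ok', c, u_hat))
        b = b + np.einsum('iok,ok->io', u_hat, v)   # agreement update
    return v                                        # (n_out, d_out)

def sequential_routing(caps, W, window):
    """Route every slice with the SAME weights W, so the parameter count
    depends on the window size and not on the sequence length T."""
    slices = slice_windows(caps, window)            # (T, window, n_in, d_in)
    flat = slices.reshape(slices.shape[0], -1, caps.shape[-1])
    out = np.stack([route_slice(u, W) for u in flat])
    return np.linalg.norm(out, axis=-1)             # (T, n_out) per-frame scores

rng = np.random.default_rng(0)
window, n_in, d_in, n_out, d_out = 3, 4, 5, 6, 8
W = 0.1 * rng.normal(size=(window * n_in, n_out, d_out, d_in))
short_scores = sequential_routing(rng.normal(size=(5, n_in, d_in)), W, window)
long_scores = sequential_routing(rng.normal(size=(12, n_in, d_in)), W, window)
print(short_scores.shape, long_scores.shape, W.size)
```

In this sketch the same `W` serves both the 5-frame and the 12-frame input, which mirrors the abstract's claim that the number of parameters is set by the window size. The per-frame scores would then be passed to a CTC loss during training.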

Similar resources

A segmental framework for fully-unsupervised large-vocabulary speech recognition

Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early systems focused on identifying isolated recurring terms in a corpus, while more recent full-coverage systems attempt to completely segment and cluster the audio into word-like units—effectively performing unsu...

Fully adaptive SVD-based noise removal for robust speech recognition

This paper presents a new approach to improve the robustness of large vocabulary continuous speech recognition. The proposed technique, based on Singular Value Decomposition (SVD), originates from classical signal enhancement, but it is adapted to the specific requirements imposed by the speech recognition process. Additive noise reduction is obtained by altering the singular value spectrum of...

Time-Warping Network: A Hybrid Framework for Speech Recognition

Enrico Bocchieri. Recently, much interest has been generated regarding speech recognition systems based on Hidden Markov Models (HMMs) and neural network (NN) hybrids. Such systems attempt to combine the best features of both models: the temporal structure of HMMs and the discriminative power of neural networks. In this work we define a time-warping (TW) neuron that extends the operation of the ...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Noise adaptive speech recognition based on sequential noise parameter estimation

In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech which is corrupted by additive non-stationary background noise. The approach sequentially estimates noise parameters, through which a nonlinear parametric function adapts mean vectors of acoustic models. In the estimation process, posterior probability of state sequence given observation sequence and ...

Journal

Journal title: Computer Speech & Language

Year: 2021

ISSN: 1095-8363, 0885-2308

DOI: https://doi.org/10.1016/j.csl.2021.101228